ABSTRACT

We investigate the impact of statistical and systematic errors on measurements of
linear redshift-space distortions (RSD) in future cosmological surveys by analysing
large catalogues of dark-matter halos from the BASICC simulation. These allow us
to estimate the dependence of errors on typical survey properties, such as volume,
galaxy density and mass (i.e. bias factor) of the adopted tracer. We find that
measurements of the specific growth rate $\beta = f/b$ using the Hamilton/Kaiser
harmonic expansion of the redshift-space correlation function $\xi(r_p, \pi)$ on
scales larger than $3\,h^{-1}$ Mpc are typically underestimated by up to 10% for
galaxy-sized halos. This is significantly larger than the corresponding statistical
errors, which amount to a few percent, indicating the importance of non-linear
improvements to the Kaiser model for obtaining accurate measurements of the growth
rate. The systematic error shows a diminishing trend with increasing bias value
(i.e. mass) of the halos considered. We compare the amplitude and trends of
statistical errors as a function of survey parameters to predictions obtained with
the Fisher information matrix technique. This technique is usually adopted to
produce RSD forecasts, based on the FKP prescription for the errors on the power
spectrum. We show that it produces parameter errors fairly similar to the standard
deviations from the halo catalogues, provided it is applied to strictly linear
scales in Fourier space ($k < 0.2\,h\,\mathrm{Mpc}^{-1}$). Finally, we combine our
measurements to define and calibrate an accurate scaling formula for the relative
error on $\beta$ as a function of the same parameters, which closely matches the
simulation results in all explored regimes. This provides a handy and plausibly
more realistic alternative to the Fisher matrix approach, to quickly and accurately
predict statistical errors on RSD expected from future surveys.
1 INTRODUCTION

Galaxy clustering as measured in redshift space contains the imprint of the
linear growth rate of structure $f(z)$, in the form of a measurable large-scale
anisotropy (Kaiser 1987). This is produced by the coherent peculiar velocity flows
towards overdensities, which add an angle-dependent contribution to the measured
redshift. In linear theory, these redshift-space distortions (RSD) in the
clustering pattern can be quantified in terms of the ratio $\beta(z) = f(z)/b(z)$
(where $b$ is the linear bias of the sample of galaxies considered). A value for
$\beta$ can be obtained by modeling the anisotropy of the redshift-space two-point
correlation function $\xi(r_p, \pi)$ (where $r_p$ and $\pi$ are the separations
perpendicular and parallel to the line of sight) or, equivalently, of the power
spectrum (see Hamilton (1998) for a review). Since $b$ can be defined as the ratio
of the rms galaxy clustering amplitude to that of the underlying matter,
$b \approx \sigma_8^{gal}/\sigma_8^{mass}$, the measured product
$\beta \times \sigma_8^{gal}$ is equivalent to the predicted combination
$f(z) \times \sigma_8^{mass}(z)$ (Song & Percival 2009). The latter is a
prediction depending on the gravity theory, once normalized to the amplitude of
matter fluctuations at the given epoch, e.g. using CMB measurements.

Measurements of the growth rate $f(z)$ are crucial to pinpoint the origin of
cosmic acceleration, distinguishing whether it requires the addition of "dark
energy" in the cosmic budget, or rather a modification of General Relativity.
These two radically alternative scenarios are degenerate when considering the
expansion rate $H(z)$ alone, as yielded, e.g., by the Hubble diagram of Type Ia
supernovae (e.g. Riess et al. 1998; Perlmutter et al. 1999) or Baryonic Acoustic
Oscillations (BAO, e.g. Percival et al. 2010). Although the RSD effect has long
been known, its important potential in the context of dark energy studies has
been fully appreciated only recently (Zhang et al. 2007; Guzzo et al. 2008). This
led to a true renaissance of interest in this technique (Wang 2008; Linder 2008;
Nesseris & Perivolaropoulos 2008; Acquaviva et al. 2008; Song & Percival 2009;
White, Song & Percival 2009; Percival & White 2009; Cabré & Gaztañaga 2009;
Blake et al. 2011), such that RSD have quickly become one of the most promising
probes for future large dark energy surveys. This is the case of the recently
approved ESA Euclid mission (Laureijs et al. 2011), which is expected to reach
statistical errors of a few percent on measurements of $f(z)$ in several redshift
bins out to $z = 2$ using this technique (coupled to similar precisions with the
complementary weak-lensing experiment).

In general, forecasts of the statistical precision reachable by future projects
on the measurements of different cosmological parameters have been produced
through widespread application of the so-called Fisher information matrix
technique (Tegmark 1997). This has also been done specifically for RSD estimates
of the growth rate and related quantities (Wang 2008; Linder 2008; White, Song &
Percival 2009; Percival & White 2009; McDonald & Seljak 2009). One limitation of
these forecasts is that they necessarily imply some idealized assumptions (e.g.
on the Gaussian nature of errors) and have not been verified, in general, against
systematic numerical tests. Such tests are not easily doable, given the large
size of planned surveys. A first attempt to produce general forecasts based on
numerical experiments was presented by Guzzo et al. (2008), who used mock surveys
built from the Millennium simulation (Springel et al. 2005) to numerically
estimate the random and systematic errors affecting their measurement of the
growth rate from the VIMOS VLT Deep Survey. Using a grid of reference survey
configurations, they calibrated an approximate scaling relation for the relative
error on $\beta$ as a function of survey volume and mean density. The range of
parameters explored in this case was however limited, and only one specific class
of galaxies (i.e. bias) was analyzed.

The second crucial aspect to be taken into consideration 
when evaluating Fisher matrix predictions, is that they only 
consider statistical errors and cannot say anything about the 
importance of systematic effects, i.e. on the accuracy of the 
expected estimates. This is clearly a key issue for projects 
aiming at percent or sub-percent precisions, for which sys- 
tematic errors will be the dominant source of uncertainty. 

In fact, a number of works in recent years suggest that the standard linear
Kaiser description of RSD is not sufficiently accurate on quasi-linear scales
($\approx 5$-$50\,h^{-1}$ Mpc) where it is routinely applied (Scoccimarro 2004;
Tinker, Weinberg & Zheng 2006; Taruya, Nishimichi & Saito 2010; Jennings, Baugh &
Pascoli 2011). Various non-linear corrections are proposed in these papers, the
difficulty often being their practical implementation in the analysis of real
data, in particular in configuration space (de la Torre & Guzzo 2012). One may
hope that in the future, with surveys covering much larger volumes, it will be
possible to limit the analysis to very large scales, where the simple linear
description should be adequate. Ongoing surveys like WiggleZ (Blake et al. 2011),
BOSS (Eisenstein et al. 2011) and VIPERS (Guzzo et al., in preparation), however,
will still need to rely on the clustering signal at intermediate scales to model
RSD.

Here, we shall address in a more systematic and extended way the impact of random
and systematic errors on growth rate measurements using RSD in future surveys. We
shall compare the results directly to Fisher matrix predictions, thoroughly
exploring the dependence of statistical errors on the survey parameters,
including, in addition to volume and density, the bias parameter of the galaxies
used. This is also relevant, as one could wonder which kind of objects would be
best suited to measure RSD in a future project. These will include halos of
different mass (i.e. bias), up to those traced by groups and clusters of galaxies.
Potentially, using groups and clusters to measure RSD could be particularly
interesting in view of massive galaxy redshift surveys such as that expected from
Euclid (Laureijs et al. 2011), which can be used to build large catalogues of
optically-selected clusters with measured redshifts. A similar opportunity will
be offered by future X-ray surveys, such as those expected from the eROSITA
mission (Cappelluti et al. 2011), although in that case mean cluster redshifts
will have to be measured first.

This paper is complementary to the parallel work of Marulli et al. (2012), where
we investigate the impact on RSD of redshift errors and explore how to
disentangle geometrical distortions introduced by the uncertainty of the
underlying geometry of the Universe - i.e. the Alcock-Paczynski effect (Alcock &
Paczynski 1979) - on measurements of RSD. Also, while we were completing our
work, independent important contributions in the same direction appeared in the
literature by Okumura & Jing (2011) and Kwan, Lewis & Linder (2012).

The paper is organized as follows. In § 2 we describe the simulations used and
the mass-selected subsamples we defined; in § 3 we discuss the technical tools
used to estimate and model the two-point correlation function in redshift space,
$\xi(r_p, \pi)$, and to estimate the intrinsic values of bias and distortion to
be used as reference; in § 4 we present the measured $\xi(r_p, \pi)$ and show the
resulting statistical and systematic errors on $\beta$, as a function of the halo
bias; here we discuss in detail how well objects related to high-bias halos, such
as groups and clusters, can be used to measure RSD; in § 5 we organise all our
results into a compact analytic formula as a function of galaxy density, bias and
survey volume; we then directly compare these results to the predictions of a
Fisher matrix code; finally we summarize our results in § 6.

2 SIMULATED DATA AND ERROR
ESTIMATION

2.1 Halo catalogues from the BASICC 
simulations 

The core of this study is based on the high-resolution Baryonic
Acoustic-oscillation Simulations at the Institute for Computational Cosmology
(BASICC) of Angulo et al. (2008), which used $1448^3$ particles of mass
$5.49 \times 10^{10}\,h^{-1}\,M_\odot$ to follow the growth of structure in dark
matter in a periodic box of side $1340\,h^{-1}$ Mpc. The simulation volume was
chosen to allow the growth of fluctuations to be modelled accurately on a wide
range of scales, including those of BAO. The very large volume of the box also
allows us to extract accurate measurements of the clustering of massive halos.
The mass resolution of the simulation is high enough to resolve halos that should
host the galaxies expected to be seen in forthcoming high-redshift galaxy surveys
(e.g. Luminous Red Galaxies in the case of SDSS-III BOSS). The cosmological
parameters adopted are broadly consistent with recent data from the cosmic
microwave background and the power spectrum of galaxy clustering (Sanchez et al.
2006): the matter density parameter is $\Omega_M = 0.25$, the cosmological
constant density parameter $\Omega_\Lambda = 0.75$, the normalization of density
fluctuations, expressed in terms of their linear amplitude in spheres of radius
$8\,h^{-1}$ Mpc at the present day, $\sigma_8 = 0.9$, the primordial spectral
index $n_s = 1$, the dark energy equation of state $w = -1$, and the reduced
Hubble constant $h = H_0/(100\,\mathrm{km\,s^{-1}\,Mpc^{-1}}) = 0.73$. We note
the high value of the normalization of the power spectrum $\sigma_8$ with respect
to more recent WMAP estimates ($\sigma_8 = 0.801 \pm 0.030$, Larson et al. 2011).
This has no effect on the results discussed here (but see Angulo & White (2010)
for a method to scale self-consistently the output of a simulation to a different
background cosmology). Outputs of the particle positions and velocities are
stored from the simulations at selected redshifts. Dark matter halos are
identified using a Friends-of-Friends (FOF) percolation algorithm (Davis et al.
1985) with a linking length of 0.2 times the mean particle separation. Position
and velocity are given by the values of the center of mass. In this paper, only
groups with at least $N_{part} = 20$ particles are considered (i.e. only halos
with mass $M_{halo} \ge 1.10 \times 10^{12}\,h^{-1}\,M_\odot$). This limit
provides reliable samples in terms of their abundance and clustering, which we
checked by comparing the halo mass function and correlation function against
Jenkins et al. (2001) and Tinker et al. (2010) respectively.
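The numbers quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the only inputs are the particle mass, box side, particle number and linking fraction stated in the text.

```python
# Values quoted in the text; everything else is derived from them.
PARTICLE_MASS = 5.49e10    # h^-1 M_sun
BOX_SIDE = 1340.0          # h^-1 Mpc
N_PART_1D = 1448           # particles per dimension (1448^3 in total)
LINKING_FRACTION = 0.2     # FOF linking length / mean particle separation


def threshold_mass(n_cut: int) -> float:
    """Minimum halo mass (h^-1 M_sun) for a threshold of n_cut particles."""
    return n_cut * PARTICLE_MASS


def linking_length() -> float:
    """FOF linking length in h^-1 Mpc."""
    mean_separation = BOX_SIDE / N_PART_1D
    return LINKING_FRACTION * mean_separation


def number_density(n_halos: int) -> float:
    """Mean comoving number density (h^3 Mpc^-3) of n_halos in the full box."""
    return n_halos / BOX_SIDE**3


if __name__ == "__main__":
    print(f"M_cut(N_cut=20) = {threshold_mass(20):.3e} h^-1 M_sun")
    print(f"linking length  = {linking_length():.4f} h^-1 Mpc")
```

With 20 particles the threshold mass comes out at about $1.10 \times 10^{12}\,h^{-1}\,M_\odot$, matching the limit quoted above.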

Table 1. Properties of the halo catalogues used in the analysis. $N_{cut}$ is the
threshold value of $N_{part}$, e.g. the catalogue $N_{cut} = 20$ is the set of
groups (i.e. halos) with at least 20 DM particles; $M_{cut}$ is the corresponding
threshold mass; $N_{tot}$ is the total number of halos (i.e. the number of halos
with $M_{halo} \ge M_{cut}$); $n$ is the number density (i.e. $n = N_{tot}/V$,
where $V = 1340^3\,h^{-3}\,\mathrm{Mpc}^3$ is the simulation volume).

We use the complete catalogue of halos of the simulation at $z = 1$, from which
we select sub-samples with different mass thresholds (i.e. number of particles).
These correspond to samples with different bias values. Table 1 reports the main
features of these catalogues. In the following we shall refer to a given
catalogue by its threshold mass $M_{cut}$ (i.e. the mass of the least massive
halo belonging to that catalogue). We also use the complete dark matter sample
(hereafter DM), including more than $3 \times 10^9$ particles$^1$. For each
catalogue, we split the whole (cubical) box of the simulation into $N_{split}^3$
sub-cubes ($N_{split} = 3$ unless otherwise stated). Each sub-cube ideally
represents a different realization of the same portion of the Universe, so that
we are able to estimate the expected precision on a quantity of cosmological
interest through its scatter among the sub-cubes. Using $N_{split} = 3$ is a
compromise between having better statistics from a larger number of sub-samples
(at the price of not sampling some very large scales), and covering even larger
scales (with $N_{split} = 2$), but with poorer statistics. In general, there are
large-scale modes shared between the sub-cubes. As a consequence, our assumption
that each sub-sample can be treated as an independent realization breaks down on
such scales. To overcome this problem, we limit our analysis to scales much
smaller than the size of the sub-cubes.
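The sub-cube procedure can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names are hypothetical, the points are uniform random numbers standing in for halo positions, and the statistic (the number of objects per sub-cube) is a placeholder for any measured quantity.

```python
import random
import statistics


def subcube_index(x, y, z, box_side, n_split):
    """Return the (i, j, k) index of the sub-cube containing a point."""
    cell = box_side / n_split
    # min() guards against a coordinate exactly equal to box_side
    return tuple(min(int(c / cell), n_split - 1) for c in (x, y, z))


def split_box(points, box_side, n_split=3):
    """Group points into n_split**3 sub-cubes keyed by (i, j, k)."""
    cubes = {(i, j, k): []
             for i in range(n_split)
             for j in range(n_split)
             for k in range(n_split)}
    for x, y, z in points:
        cubes[subcube_index(x, y, z, box_side, n_split)].append((x, y, z))
    return cubes


def scatter_over_subcubes(cubes, statistic):
    """Mean and sample standard deviation of a per-sub-cube statistic."""
    values = [statistic(members) for members in cubes.values()]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    rng = random.Random(0)
    box = 1340.0  # h^-1 Mpc, the box side quoted in the text
    pts = [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
           for _ in range(27000)]
    mean_n, sigma_n = scatter_over_subcubes(split_box(pts, box), len)
    print(f"{mean_n:.1f} +/- {sigma_n:.1f} objects per sub-cube")
```

Replacing `len` with a function that measures a clustering statistic on each sub-cube gives the scatter used as an error estimate, under the stated assumption that the sub-cubes are independent on the scales of interest.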

This analysis concentrates on $z = 1$, because this is central to the range of
redshifts that will be increasingly explored by surveys of the next generation.
These include galaxy surveys, but also surveys of clusters of galaxies, such as
those that should be possible with the eRosita satellite, possibly due to launch
in 2013. Exploring the expectations from RSD studies using high-bias objects,
corresponding e.g. to groups of galaxies, is one of the main themes of this paper.



1 Such a number of points involves very long computational times when
calculating, e.g., a two-point correlation function. To overcome this problem, we
often use a sparsely sampled sub-set of the DM catalogue. In order to limit the
impact of shot-noise, we nevertheless always keep the DM samples denser than the
least dense halo catalogue (i.e. $M_{cut} = 1.10 \times 10^{12}\,h^{-1}\,M_\odot$).
We verified directly on a subset that our results do not effectively depend on
the level of DM dilution.

2.2 Simulating redshift-space observations

For our measurements we need to simulate redshift-space observations. In other
words, we have to "observe" the simulations as if the only information about the
distance of an object was given by its redshift. For this purpose we center the
sample (i.e. one of the sub-cubes) at a distance given by

$$ D_1 = D(z=1) = c \int_0^1 \frac{dz'}{H(z')} = \frac{c}{H_0} \int_0^1 \frac{dz'}{\sqrt{\Omega_M (1+z')^3 + \Omega_\Lambda}} , \qquad (1) $$

where the last equality holds for the flat $\Lambda$CDM cosmology of the
simulation. More explicitly, we transform the positions $(X_i, Y_i, Z_i)$ of an
object in a sub-cube of side $L$ into new comoving coordinates

$$ -\frac{L}{2} \le X_i \le \frac{L}{2} , \qquad D_1 - \frac{L}{2} \le Y_i \le D_1 + \frac{L}{2} , \qquad -\frac{L}{2} \le Z_i \le \frac{L}{2} , \qquad (2) $$

where we arbitrarily choose the direction of the $Y$ axis for the translation
($Z$ represents a coordinate, not to be confused with the redshift $z$). This
procedure assigns to each object a comoving distance in real space
$D_i = \sqrt{X_i^2 + Y_i^2 + Z_i^2}$ and hence, inverting Eq. (1), a cosmological
(undistorted) redshift $z_i$. We then add the Doppler contribution to obtain the
"observed" redshift, as

$$ \hat{z}_i = z_i + \frac{v_r}{c} (1 + z_i) , \qquad (3) $$

where $v_r$ is the line-of-sight peculiar velocity. Using $\hat{z}_i$ instead of
$z_i$ to compute the comoving distance of an object gives its redshift-space
coordinate. Finally, in order to eliminate the blurring effect introduced at the
borders of the cube, we trim a slice of $10\,h^{-1}$ Mpc from all sides, a value
about three times larger than the typical pairwise velocity dispersion.
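The mapping from real to redshift space described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the paper: the function names are hypothetical, the integral of Eq. (1) is done with Simpson's rule, the inversion is a simple bisection, and distances are in $h^{-1}$ Mpc so that $H_0 = 100$ km s$^{-1}$ per $h^{-1}$ Mpc.

```python
import math

C_KMS = 299792.458               # speed of light in km/s
OMEGA_M, OMEGA_L = 0.25, 0.75    # flat LCDM parameters quoted for the simulation


def inv_e(z):
    """H0 / H(z) for a flat LCDM cosmology."""
    return 1.0 / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, steps=2000):
    """Comoving distance in h^-1 Mpc via Simpson's rule (steps must be even)."""
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    return (C_KMS / 100.0) * total * h / 3.0


def redshift_at(distance, z_lo=0.0, z_hi=10.0, tol=1e-8):
    """Invert comoving_distance by bisection (distance grows with z)."""
    while z_hi - z_lo > tol:
        mid = 0.5 * (z_lo + z_hi)
        if comoving_distance(mid, steps=200) < distance:
            z_lo = mid
        else:
            z_hi = mid
    return 0.5 * (z_lo + z_hi)


def redshift_space_position(pos, vel):
    """Map a real-space position (h^-1 Mpc, observer at the origin) and a
    peculiar velocity (km/s) to the redshift-space position."""
    d = math.sqrt(sum(c * c for c in pos))
    unit = [c / d for c in pos]
    v_r = sum(v * u for v, u in zip(vel, unit))        # line-of-sight velocity
    z_cos = redshift_at(d)                             # undistorted redshift
    z_obs = z_cos + (v_r / C_KMS) * (1.0 + z_cos)      # add Doppler term
    s = comoving_distance(z_obs, steps=200)
    return [s * u for u in unit]


if __name__ == "__main__":
    d1 = comoving_distance(1.0)
    print(f"D(z=1) = {d1:.1f} h^-1 Mpc")
    print(redshift_space_position((0.0, d1, 0.0), (0.0, 300.0, 0.0)))
```

A purely transverse velocity leaves the position unchanged, while a receding object is displaced outwards along the line of sight by roughly $v_r (1+z)/H(z)$.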



3 MEASURING REDSHIFT-SPACE 
DISTORTIONS 

3.1 Modelling linear and non-linear distortions 

In a fundamental paper, Kaiser (1987) showed that, in the linear regime, the
redshift-space modification of the observed clustering pattern due to coherent
infall velocities takes a simple form in Fourier space:

$$ P_S(k, \mu_k) = \left(1 + \beta \mu_k^2\right)^2 P_R(k) , \qquad (4) $$

where $P$ is the power spectrum (subscripts $R$ and $S$ denote respectively
quantities in real and redshift space), $\mu_k$ is the cosine of the angle
between the line of sight and the wave vector $k$, and $\beta = f/b$ is the
distortion factor, where $f = d\ln G / d\ln a$ and $G$ is the linear growth
factor of density perturbations. Hamilton (1992) translated this result into
configuration space (i.e. in terms of the correlation function, $\xi$):

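$$ \xi_S^{(L)}(r_p, \pi) = \xi_0(r) P_0(\mu) + \xi_2(r) P_2(\mu) + \xi_4(r) P_4(\mu) , \qquad (5) $$

where $r_p$ and $\pi$ are the separations perpendicular and parallel to the line
of sight, $\mu$ is the cosine of the angle between the separation vector and the
line of sight, $\mu = \cos\theta = \pi/r$, $P_l$ are Legendre polynomials and
$\xi_l$ are the multipole moments of $\xi(r_p, \pi)$, which can be expressed as

$$ \xi_0(r) = \left(1 + \frac{2\beta}{3} + \frac{\beta^2}{5}\right) \xi(r) , \qquad (6) $$

$$ \xi_2(r) = \left(\frac{4\beta}{3} + \frac{4\beta^2}{7}\right) \left[\xi(r) - \bar{\xi}(r)\right] , \qquad (7) $$

$$ \xi_4(r) = \frac{8\beta^2}{35} \left[\xi(r) + \frac{5}{2}\bar{\xi}(r) - \frac{7}{2}\bar{\bar{\xi}}(r)\right] , \qquad (8) $$

where

$$ \bar{\xi}(r) = \frac{3}{r^3} \int_0^r \xi(t)\, t^2\, dt , \qquad (9) $$

$$ \bar{\bar{\xi}}(r) = \frac{5}{r^5} \int_0^r \xi(t)\, t^4\, dt . \qquad (10) $$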

The superscript $L$ reminds us that Eq. (5) holds only in the linear regime. A
full model, accounting for both linear and non-linear motions, is obtained
empirically, through a convolution with the distribution function of random
pairwise velocities along the line of sight, $\varphi(v)$:

$$ \xi_S(r_p, \pi) = \int_{-\infty}^{\infty} \xi_S^{(L)}\left[r_p, \pi - \frac{v(1+z)}{H(z)}\right] \varphi(v)\, dv , \qquad (11) $$

where $z$ is the redshift and $H(z)$ is the Hubble function (Davis & Peebles
1983; Fisher et al. 1994; Peacock 1999). We represent $\varphi(v)$ by an
exponential form, consistent with observations and N-body simulations (e.g. Zurek
et al. 1994),

$$ \varphi(v) = \frac{1}{\sigma_{12}\sqrt{2}} \exp\left(-\frac{\sqrt{2}\,|v|}{\sigma_{12}}\right) , \qquad (12) $$

where $\sigma_{12}$ is a pairwise velocity dispersion. We note in passing that
the use of a Gaussian form for $\varphi(v)$ is in some cases to be preferred, e.g.
when large redshift measurement errors affect the catalogues to be analyzed. This
is discussed in detail in Marulli et al. (2012). Hereafter we shall refer to
Eq. (5) and Eq. (11) as the linear and linear-exponential model, respectively.
Moreover, in order to simplify the notation, we shall refer to the real- and
redshift-space correlation functions just as $\xi(r)$ and $\xi(r_p, \pi)$
respectively, removing the subscripts $R$ and $S$.
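As an illustration of the two models, the sketch below evaluates the linear multipole expansion and its convolution with the exponential velocity distribution. It rests on assumptions that are not part of the paper: the real-space correlation function is a toy power law (for which the two barred integrals are analytic), the function names are hypothetical, and the convolution is a plain trapezoidal sum truncated at ten times the velocity dispersion.

```python
import math


def xi_real(r, r0=5.0, gamma=1.8):
    """Toy power-law real-space correlation function (an assumption)."""
    return (r / r0) ** (-gamma)


def xi_linear(rp, pi, beta, r0=5.0, gamma=1.8):
    """Linear model: multipole expansion for a power-law xi(r), rp > 0.
    For a power law, xi_bar = 3/(3-gamma) xi and xi_barbar = 5/(5-gamma) xi."""
    r = math.hypot(rp, pi)
    mu = pi / r
    xi = xi_real(r, r0, gamma)
    xi_bar = 3.0 / (3.0 - gamma) * xi
    xi_barbar = 5.0 / (5.0 - gamma) * xi
    xi0 = (1.0 + 2.0 * beta / 3.0 + beta**2 / 5.0) * xi
    xi2 = (4.0 * beta / 3.0 + 4.0 * beta**2 / 7.0) * (xi - xi_bar)
    xi4 = 8.0 * beta**2 / 35.0 * (xi + 2.5 * xi_bar - 3.5 * xi_barbar)
    p2 = 0.5 * (3.0 * mu**2 - 1.0)                       # Legendre P2
    p4 = (35.0 * mu**4 - 30.0 * mu**2 + 3.0) / 8.0       # Legendre P4
    return xi0 + xi2 * p2 + xi4 * p4


def phi_exp(v, sigma12):
    """Exponential pairwise-velocity distribution, normalized to unity."""
    root2 = math.sqrt(2.0)
    return math.exp(-root2 * abs(v) / sigma12) / (sigma12 * root2)


def xi_linear_exponential(rp, pi, beta, sigma12, hubble_over_1pz, n=4001):
    """Linear-exponential model by trapezoidal integration over v.
    hubble_over_1pz is H(z)/(1+z) in km/s per h^-1 Mpc; sigma12 is in km/s."""
    vmax = 10.0 * sigma12
    dv = 2.0 * vmax / (n - 1)
    total = 0.0
    for i in range(n):
        v = -vmax + i * dv
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * xi_linear(rp, pi - v / hubble_over_1pz, beta) * phi_exp(v, sigma12)
    return total * dv
```

In the limit of a very small velocity dispersion the convolved model tends to the linear one, which is a convenient sanity check of any implementation.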



3.2 Fitting the redshift-space correlation function 

We can estimate $\beta$ (and $\sigma_{12}$, for the linear-exponential model)
through this modelling, by minimizing the following $\chi^2$ function over a
spatial grid:

$$ -2 \ln \mathcal{L} = \sum_{i,j} \frac{\left(y_{ij}^{(m)} - y_{ij}\right)^2}{\delta_{ij}^2} , \qquad (13) $$

where $\mathcal{L}$ is the likelihood and we have defined the quantity

$$ y_{ij} = \log\left[1 + \xi(r_{p,i}, \pi_j)\right] . \qquad (14) $$

Here the superscript $m$ indicates the model and $\delta_{ij}^2$ represents the
variance of $y_{ij}$. The use of $\log(1 + \xi)$ in Eq. (14) has the advantage of
placing more weight on large (linear) scales (Hawkins et al. 2003). However,
unlike Hawkins et al. (2003), we simply use the sample variance of $y_{ij}$ to
estimate $\delta_{ij}^2$ (as in Guzzo et al. 2008). We show in Appendix A that
this definition provides more stable estimates of $\beta$ also in the low-density
regime. The correlation functions are measured using the minimum variance
estimator of Landy & Szalay (1993). We tested different estimators, such as
Davis & Peebles (1983), Hewett (1982) and Hamilton (1993), finding that our
measurements are virtually insensitive to the estimator choice, at least for
$r < 50\,h^{-1}$ Mpc. For the linear-exponential model, we perform a
two-parameter fit, including the velocity dispersion, $\sigma_{12}$, as a free
parameter. However, since our interest here is focused on measurements of the
growth rate (through $\beta$), $\sigma_{12}$ is treated merely as an extra
parameter to (potentially) account for deviations from linear theory$^2$.

Figure 1. Left: the real-space correlation functions of the halo catalogues,
compared to that of the dark-matter particles in the BASICC simulation. Right:
the ratio of $\xi_{halo}(r)$ and $\xi_{DM}(r)$ for each catalogue, with the
resulting best-fit linear bias $b^2 = \xi_{halo}(r)/\xi_{DM}(r) = \mathrm{const}$,
fitted over the range $10 < r < 50\,h^{-1}$ Mpc. Error bars correspond to the
standard deviation (of the mean) over 27 sub-cubes.
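The diagonal fit described above can be sketched as follows. This is a minimal illustration with hypothetical names: the model is passed in as a function of the trial distortion parameter, the minimization is a brute-force scan over a one-dimensional grid, and the toy example uses only the monopole amplification factor rather than the full two-dimensional model.

```python
import math


def y_of(xi):
    """The fitted quantity y = log(1 + xi)."""
    return math.log(1.0 + xi)


def chi2(model_xi, data_xi, var_y):
    """Diagonal chi-square: sum over bins of (y_model - y_data)^2 / var(y)."""
    return sum((y_of(m) - y_of(d)) ** 2 / v
               for m, d, v in zip(model_xi, data_xi, var_y))


def fit_beta(model, data_xi, var_y, betas):
    """Return the trial beta that minimizes chi2.
    model(beta) must return xi values in the same bin order as data_xi."""
    return min(betas, key=lambda b: chi2(model(b), data_xi, var_y))


if __name__ == "__main__":
    # Toy example (an assumption): only the monopole amplification is modelled.
    xi_r = [2.0, 0.8, 0.3, 0.1]

    def toy_model(beta):
        return [(1.0 + 2.0 * beta / 3.0 + beta**2 / 5.0) * x for x in xi_r]

    data = toy_model(0.4)
    grid = [i / 100 for i in range(101)]
    print(fit_beta(toy_model, data, [0.01] * len(data), grid))
```

For the linear-exponential model the same scan would run over a two-dimensional grid in the distortion parameter and the velocity dispersion.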

Finally, in performing the fit we have neglected an important aspect, but for
good reasons. In principle, we should consider that the bins of the correlation
function are not independent. As such, Eq. (13) should be modified so as to
include also the contribution of non-diagonal terms in the covariance matrix,
i.e. (in matrix form)

$$ -2 \ln \mathcal{L} = \left(Y^{(m)} - Y\right)^T C^{-1} \left(Y^{(m)} - Y\right) , \qquad (15) $$

where $Y$ and $Y^{(m)}$ are two (column) vectors containing all data and model
values respectively (with dimension $N_b^2$, where $N_b$ is the number of bins in
one dimension used to estimate $\xi(r_p, \pi)$), whereas $C$ is the covariance
matrix, with dimension $N_b^2 \times N_b^2$.

2 See, for instance, Scoccimarro (2004) for a detailed discussion about the
physical meaning of $\sigma_{12}$.

Figure 2. The bias factor, expressed as $b^2 = \xi_{halo}(r)/\xi_{DM}(r)$,
plotted over a wider range of separations than in the previous figure. Dashed
lines are obtained by fitting a constant bias model over the range denoted by the
grey area, $10 < r < 50\,h^{-1}$ Mpc. Error bars give the standard deviation of
the mean over the 27 sub-cubes.

This is routinely used when fitting 1D correlation functions (e.g. Fisher et al.
1994), but it becomes arduous in the case of the full $\xi(r_p, \pi)$, for which
$N_b \approx 100$ and the covariance matrix has $\approx 10^8$ elements. What
happens in practice is that the estimated functions are over-sampled, so that the
effective number of degrees of freedom in the data is smaller than the number of
components in the covariance matrix, which is then singular. Even a test with as
many as 100 blockwise bootstrap realizations yields a very unsatisfactory
covariance matrix. We tested on a smaller-size $\xi(r_p, \pi)$ the actual effect
of assuming negligible off-diagonal elements in the covariance matrix, obtaining
a difference of a few percent in the measured value of $\beta$, as also found in
de la Torre & Guzzo (2012). Part of this insensitivity is certainly related to
the very large volumes of the mock samples, with respect to the scales involved
in the parameter estimations. This corroborates our forced choice of ignoring
covariances in the present work, a choice also dictated by the computational time
involved in inverting such large matrices, multiplied by the huge number of
estimates needed for the present work.
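One way to see why a limited number of realizations cannot deliver an invertible covariance matrix is to note that a sample covariance built from $n$ realizations has rank at most $n - 1$. The sketch below checks this numerically on random data; it is an illustration only, with hypothetical function names and Gaussian random numbers standing in for the binned measurements.

```python
import random


def sample_covariance(realizations):
    """Sample covariance matrix of a list of equal-length data vectors."""
    n = len(realizations)
    m = len(realizations[0])
    means = [sum(r[j] for r in realizations) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in realizations) / (n - 1)
             for j in range(m)]
            for i in range(m)]


def matrix_rank(matrix, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    rows, cols = len(a), len(a[0])
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


if __name__ == "__main__":
    rng = random.Random(1)
    n_bins = 12
    for n_real in (5, 40):
        data = [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(n_real)]
        print(n_real, "realizations -> rank", matrix_rank(sample_covariance(data)))
```

With about $10^4$ bins, as in the full two-dimensional case, 100 realizations would therefore leave the estimated matrix far from full rank.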



3.3 Reference distortion parameters and bias 
values of the simulated samples 

Before measuring the amplitude of redshift distortions in the various samples
described above, we need to establish the reference values to which our
measurements will be compared, in order to identify systematic effects.
Specifically, we need to determine with the highest possible accuracy the
intrinsic "true" value of $\beta$ for all mass-selected samples in the
simulation. This can be obtained from the relation (Peebles 1980; Fry 1985;
Lightman & Schechter 1990; Wang & Steinhardt 1998)

$$ \beta(z) = \frac{\Omega_M^{0.55}(z)}{b(z)} , \qquad (16) $$

where $f(z) = \Omega_M^{0.55}(z)$ is the growth rate of fluctuations at the given
redshift$^3$. For the flat cosmology of the simulation, $\Omega_M(z)$ is

$$ \Omega_M(z) = \frac{(1+z)^3\, \Omega_{M0}}{(1+z)^3\, \Omega_{M0} + (1 - \Omega_{M0})} . \qquad (17) $$

The linear bias can be estimated as

$$ b^2 = \frac{\xi_{halo}(r)}{\xi_{DM}(r)} . \qquad (18) $$

Here $\xi_{halo}$ and $\xi_{DM}$ have to be evaluated at large separations,
$r > 10\,h^{-1}$ Mpc, where the linear approximation holds. In the following we
shall adopt the notation $b_t$ and $\beta_t$ for the values thus obtained. To
recover the bias and its error for each $M_{cut}$ listed in Table 1 we split each
cubic catalogue of halos into 27 sub-cubes. Figure 1 shows the measured two-point
correlation functions and the corresponding bias values for the various
sub-samples. These are computed at different separations $r$, as the average over
27 sub-cubes, with error bars corresponding to the standard deviation of the
mean. Dashed lines give the corresponding value of $b^2$, obtained by fitting a
constant over the range $10 < r < 50\,h^{-1}$ Mpc. In most cases, the bias
functions show a similar scale dependence, but the fluctuations are compatible
with scale-independence within the error bars (in particular for halo masses
$M_{cut} < 1.70 \times 10^{13}\,h^{-1}\,M_\odot$). For completeness, in Figure 2
we show that this remains valid on larger scales ($r > 50\,h^{-1}$ Mpc), whereas
on small scales ($r < 10\,h^{-1}$ Mpc) a significant scale-dependence is present.
The linear bias assumption is therefore acceptable for $r > 10\,h^{-1}$ Mpc.
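A short numerical sketch of how the reference values can be assembled is given below. The assumptions are labelled explicitly: the growth rate is approximated as the matter density parameter raised to a growth index of 0.55 (the usual approximation for flat cosmologies with a cosmological constant), the constant-bias fit is reduced to a weighted mean of the ratio of the two correlation functions, and all function names are hypothetical.

```python
import math


def omega_m_z(z, omega_m0=0.25):
    """Matter density parameter at redshift z for a flat cosmology."""
    matter = (1.0 + z) ** 3 * omega_m0
    return matter / (matter + (1.0 - omega_m0))


def growth_rate(z, omega_m0=0.25, growth_index=0.55):
    """Approximate growth rate f(z) = Omega_M(z)^growth_index (an assumption)."""
    return omega_m_z(z, omega_m0) ** growth_index


def linear_bias(xi_halo, xi_dm, weights=None):
    """Constant fit to b^2 = xi_halo / xi_dm over the supplied bins.
    A weighted mean is the least-squares fit of a constant when the weights
    are inverse variances; equal weights are used if none are given."""
    ratios = [h / d for h, d in zip(xi_halo, xi_dm)]
    if weights is None:
        weights = [1.0] * len(ratios)
    b2 = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return math.sqrt(b2)


def beta_true(z, bias, omega_m0=0.25):
    """Reference distortion parameter, growth rate divided by bias."""
    return growth_rate(z, omega_m0) / bias


if __name__ == "__main__":
    print(f"Omega_M(z=1) = {omega_m_z(1.0):.4f}")
    print(f"f(z=1)       = {growth_rate(1.0):.4f}")
```

For the cosmology of the simulation this gives a matter density parameter of about 0.727 and a growth rate of about 0.84 at $z = 1$.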

In a realistic scenario, $\beta$ is measured from a redshift survey. Then the
growth rate is recovered as $f = b\beta$. Unfortunately, in a real survey it is
not possible to estimate $b$ through Eq. (18) as we described above (and as is
done for dark matter simulations), since the real observable is the two-point
correlation function of galaxies, whereas $\xi_{DM}$ cannot be directly observed.
A possible solution is to assume a model for the dependence of the bias on the
mass. Using groups/clusters in this context may be convenient, as their total
(DM) mass can be estimated from the X-ray emission temperature or luminosity. We
compare our directly measured $b$ with those calculated from two popular models,
Sheth, Mo & Tormen (2001) and Tinker et al. (2010) (hereafter SMT01 and T+10), in
Figure 3. Details on how we compute $b_{SMT01}$ and $b_{T+10}$ are reported in
the parallel paper by Marulli et al. (2012). We see that for small/intermediate
masses our measurements are in good agreement with T+10, whereas for larger
masses, $M_{cut} > 2 \times 10^{13}\,h^{-1}\,M_\odot$, SMT01 yields a more
reliable prediction of the bias.



3 In this section we adopt the notation $\Omega_M = \Omega_M(z)$ and
$\Omega_{M0} = \Omega_M(z = 0)$, not to be confused with the notation
$\Omega_M = \Omega_M(z = 0)$ adopted elsewhere in this work.

Figure 3. Comparison of the bias values measured from the simulated catalogues as
a function of their threshold mass, $M_{cut}$, with the predictions of the SMT01
and T+10 models. The top axis also reports the number of particles per halo,
$N_{cut}$, corresponding to the catalogue threshold mass.



4 SYSTEMATIC ERRORS IN
MEASUREMENTS OF THE GROWTH RATE

4.1 Fitting the linear-exponential model 

As in the previous section, we split each of the 12 mass-selected halo catalogues
of Table 1 into 27 sub-cubes. Then we compute the redshift-space correlation
function $\xi(r_p, \pi)$ for each of them. Figure 4 gives an example of three
cases of different mass. Following the procedure described in Section 3.2 we
obtain an estimate of the distortion parameter $\beta$. The 27 values of $\beta$
are then used to estimate the mean value and standard deviation of $\beta$ as a
function of the mass threshold (i.e. bias). With the adopted setup (binning and
range), the fit becomes unstable for $M_{cut} > 3 \times 10^{13}\,h^{-1}\,M_\odot$,
in the sense of yielding highly fluctuating values for $\beta$ and its scatter.
Very probably, this is due to the increasing sparseness of the samples and the
reduced amplitude of the distortion (since $\beta \propto 1/b$). Figure 4
explicitly shows these two effects: when the mass grows (top to bottom panels)
the shot-noise, which depends on the number density, increases, whereas the
compression along the line of sight decreases, since it depends on the amplitude
of $\beta$. For this reason, in this work we consider only catalogues below this
mass threshold, as listed in Table 1.
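The reduction of the 27 sub-cube estimates to a statistical and a systematic error can be sketched as below. The definitions used here are assumptions made for illustration: the relative statistical error is taken as the sample standard deviation divided by the mean, and the systematic error as the fractional offset of the mean from the reference value; the function name and the input numbers are hypothetical.

```python
import statistics


def summarize_beta(beta_values, beta_true):
    """Mean, scatter, relative error and systematic offset of beta estimates."""
    mean = statistics.mean(beta_values)
    sigma = statistics.stdev(beta_values)          # sample standard deviation
    return {
        "mean": mean,
        "sigma": sigma,
        "relative_error": sigma / mean,
        "systematic_error": (mean - beta_true) / beta_true,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    summary = summarize_beta([0.9, 1.0, 1.1], beta_true=1.25)
    for key, value in summary.items():
        print(f"{key:>17s} = {value:+.3f}")
```

A negative systematic error in this convention corresponds to an underestimate of the distortion parameter.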

Figure 5 summarizes our results. The plot shows the mean values of $\beta$ for
each mass sample, together with their confidence intervals (obtained from the
scatter of the sub-cubes), compared to the expected values of the simulation
$\beta_t$ (also plotted with their uncertainties, due to the error on the
measured bias $b_t$, Section 3.3). These have been obtained using the
linear-exponential model, Eq. (11), which represents the standard approach in
previous works, fitting over the range $3 < r_p < 35\,h^{-1}$ Mpc,
$0 < \pi < 35\,h^{-1}$ Mpc with linear bins of $0.5\,h^{-1}$ Mpc. We also remark
that here the model is built using the "true" $\xi(r)$ measured directly in
real space, which is not directly observable in the case of real data. This is
done so as to clearly separate the limitations depending on the linear
assumption from those introduced by a limited reconstruction of the underlying
real-space correlation function. In Appendix B we shall therefore discuss
separately the effects of deriving $\xi(r)$ directly from the observations.

Figure 4. $\xi(r_p, \pi)$ for the catalogues with
$M_{cut} = 1.10 \times 10^{12}\,h^{-1}\,M_\odot$ (upper panel),
$M_{cut} = 9.99 \times 10^{12}\,h^{-1}\,M_\odot$ (central panel) and
$M_{cut} = 3.00 \times 10^{13}\,h^{-1}\,M_\odot$ (lower panel). Iso-correlation
contours of the data are shown in cyan, whereas the best fit model corresponds to
the black curves. Note that the color scale and contour levels differ in the
three panels. The latter are arbitrarily set to {0.07, 0.13, 0.35, 1},
{0.15, 0.3, 0.7, 2.8} and {0.25, 0.5, 1.3, 5} respectively from top to bottom.
When the mass grows, the distortion parameter $\beta$ (i.e. the compression of
the pattern along the line of sight) decreases, whereas the correlation and the
shot-noise increase.

Despite the apparently very good fits (Fig. 4), we find a systematic discrepancy
between the measured and the true value of $\beta$. The systematic error is
maximum ($\approx 10\%$) for low-bias (i.e. low-mass) halos and tends to decrease
for larger values (note that here with "low bias" we indicate galaxy-sized halos
with $M \approx 10^{12}\,h^{-1}\,M_\odot$). In particular, for $M_{cut}$ between
$7 \times 10^{12}$ and $\approx 10^{13}\,h^{-1}\,M_\odot$ the expectation value
of the measurement is very close to the true value $\beta_t$.

It is interesting, and somewhat surprising, that, although massive halos are
intrinsically sparser (and hence disfavoured from a statistical point of view),
the scatter of $\beta$ (i.e. the width of the green error corridor in Figure 5)
does not increase in absolute terms, showing little dependence on the halo mass.
Since the value of $\beta$ is decreasing, however, the relative error does have a
dependence on the bias, as we
shall better discuss in § 



4.2 Is a pure Kaiser model preferable for 
cluster-sized halos? 

Groups and clusters would seem to be natural candidates to 
trace large-scale motions based on a purely linear descrip- 
tion, since they essentially trace very large scales and most 
non-linear velocities are confined within their structure. Us- 
ing clusters as test particles (i.e. ignoring their internal de- 
grees of freedom) we are probing mostly linear, coherent mo- 
tions. It makes sense therefore to repeat our measurements 
using the linear model alone, without exponential damping 
correction. The results are shown in Figure 6. The relative statistical error (lower panel)
obtained in this case is in general smaller than when the exponential damping is included.
This is a consequence of the fact that the linear model depends on only one free parameter,
$\beta$, whereas the linear-exponential model depends on two free parameters, $\beta$ and
$\sigma_{12}$. Both models yield a similar systematic error (central panel), except in the
lower mass-cutoff range, where the exponential correction clearly has a beneficial effect.
In the following we briefly summarize how statistical and systematic errors combine. To do
this we consider three different, arbitrarily chosen, mass ranges.

(i) Small masses ($M_{\rm cut} < 5 \times 10^{12}\,h^{-1}\,M_\odot$)
This range corresponds to halos hosting single $L_*$ galaxies. Here the linear-exponential
model, which gives a smaller systematic error, is still not able to recover the expected
value of $\beta$. However, any consideration about these "galactic halos" may not be fully
realistic since our halo catalogues lack sub-structure (see Section 4.4).

(ii) Intermediate masses ($5 \times 10^{12} < M_{\rm cut} < 2 \times 10^{13}\,h^{-1}\,M_\odot$)
This range corresponds to halos hosting very massive galaxies and groups. The systematic
error is small compared to that of the other mass ranges, for both models. This means that
we are free to use the linear model, which always gives a smaller statistical error (lower
panel), without having to worry too much about its systematic error, which in any case is
not larger than that of the more complex model. In particular, we notice that using the
simple linear model in this mass range, the statistical error on $\beta$ is comparable to
that obtained with a galaxy-mass sample using the more phenomenological linear-exponential
model. This may be a reason for preferring the use of this mass range for measuring $\beta$.

(iii) Large masses ($M_{\rm cut} > 2 \times 10^{13}\,h^{-1}\,M_\odot$)
This range corresponds to halos hosting what we may describe as large groups or small
clusters. The random error increases rapidly with mass (Figure 6, lower panel), regardless
of the model, due to the reduction of the distortion signal ($\beta \propto 1/b$) and to the
decreasing number density.



4.3 Origin of the systematic errors 

The results of the previous two sections are not fully 
unexpected. It has been evidenced in a number of recent 
papers that the standard linear Kaiser description of 
RSD, Eq. 0, is not sufficiently accurate on the quasi- 
linear sca les (~ 5 4- 5 h~ Mpc) where it is normally 
applied (Scoccimarro 2004; Tinker, Weinberg, & Zheng 2006; Taruya, Nishimichi, & Saito 2010;
Jennings, Baugh, & Pascoli 2011; Okumura & Jing 2011; Kwan, Lewis, & Linder 2012). This
involves not only the
linear model, but also what we called the linear-exponential model. Since the pioneering
work of Davis & Peebles (1983), the exponential factor is meant to include the small-scale
non-linear motions, but this is in fact empirical and only partially compensates for the
inaccurate non-linear description. The systematic error we quantified with our simulations
is thus most plausibly interpreted as due to the inadequacy of this model on such scales.
Various improved non-linear corrections are proposed in the quoted papers, although their
performance in the case of real galaxies still requires further refinement (e.g. de la
Torre & Guzzo 2012). On the other hand, considering larger and larger (i.e. more linear)
scales, one would expect to converge to the Kaiser limit. In this regime, however, other
difficulties emerge, specifically the low clustering signal, the need to model the BAO peak
and the wide-angle effects (Samushia, Percival, & Raccanelli 2012). We have explored this,
although not in a systematic way. We find no indication of a positive trend, in the sense of
a reduction of the systematic error when increasing the minimum scale $r_{\rm min}$ included
in the fit, at least for $r_{\rm min} = 20\,h^{-1}$ Mpc. Systematic errors remain present,
while the statistical error increases dramatically. The situation improves only in a
relative sense, because statistical error bars become larger than the systematic error.
This is seen in more detail in the parallel work by de la Torre & Guzzo (2012). Finally, it
is interesting to remark the indication that systematic errors can be reduced by using the
Kaiser model on objects that are intrinsically more suitable for a fully linear description.



4.4 Role of sub-structure: analysis of the 
Millennium mocks 

In the simulated catalogues we use here, sub-structures inside halos, i.e. sub-halos, are
not resolved, due to the use of a single linking length when running the Friends-of-Friends
algorithm (Section 2.1). As such, the catalogues do not in fact correctly reproduce the
small-scale dynamics observed in real surveys. Although we expect that our fit (limited to
scales $r_p > 3\,h^{-1}$ Mpc) is not directly sensitive to what happens on the small scales
where cluster dynamics dominate, we have decided to perform here a simple direct check of
whether these limitations might play a role in the results obtained. Essentially, we want
to understand whether the absence of sub-structure could be responsible for the enhanced
systematic error we found for the low-mass halos.

Figure 5. The mean values of $\beta$ averaged over 27 sub-cubes, as measured in each mass
sample (open circles), estimated using the "standard" linear-exponential model of Eq. (11).
The dark- and light-green bands give respectively the $1\sigma$ and $3\sigma$ confidence
intervals around the mean. The measured values are compared to the expected values
$\beta_t$, computed using Eqs. (16) and (18). We also give the $1\sigma$ and $3\sigma$
theoretical uncertainty around $\beta_t$, due to the uncertainty in the bias estimate (brown
and red bands, respectively).

To this end, we further analysed 100 Millennium mock surveys. These are obtained by
combining the output of the pure dark-matter Millennium run (Springel et al. 2005) with the
Munich semi-analytic model of galaxy formation (De Lucia & Blaizot 2007). The Millennium
Run is a large dark-matter N-body simulation which traces the hierarchical evolution of
$2160^3$ particles between $z = 127$ and $z = 0$ in a cubic volume of
$500^3\,h^{-3}\,{\rm Mpc}^3$, using the same cosmology as the BASICC simulation,
$(\Omega_m, \Omega_\Lambda, \Omega_b, h, n, \sigma_8) = (0.25, 0.75, 0.045, 0.73, 1, 0.9)$.
The mass resolution, $8.6 \times 10^{8}\,h^{-1}\,M_\odot$, allows one to resolve halos
containing galaxies with a luminosity of $0.1 L_*$ with a minimum of 100 particles. Details
are given in Springel et al. (2005). The one hundred mocks reproduce the geometry of the
VVDS-Wide "F22" survey analysed in Guzzo et al. (2008) (except for the fact that we use
complete samples, i.e. with no angular selection function), covering $2 \times 2$ deg$^2$ and
$0.7 < z < 1.3$. Clearly, these samples are significantly smaller than the halo catalogues
built from the BASICC simulations, yet they describe galaxies in a more realistic way and
allow us to study what happens on small scales. In addition, while the BASICC halo
catalogues are characterized by a well-defined mass threshold, the Millennium mocks are
meant to reproduce the selection function of an $I_{AB} < 22.5$ magnitude-limited survey like
VVDS-Wide. From each of the 100 light cones, we further consider only galaxies lying at
$0.7 < z < 1.3$, to have a median redshift close to unity. The combination of these two sets
of simulations should hopefully provide us with enough information to disentangle real
effects from artifacts.

Performing the same kind of analysis applied to the BASICC halo catalogues (Figure 7), we
find a comparable systematic error, corresponding to an under-estimate of $\beta$ by about
10%. We recover $\beta = 0.577 \pm 0.018$, against an expected value of
$\beta_t = 0.636 \pm 0.006$, suggesting that our main conclusions are substantially
unaffected by the limited description of sub-halos in the BASICC samples. Another potential
source of systematic errors in the larger simulations could be resolution: the dynamics of
the smaller halos could be unrealistic simply because they contain too few dark-matter
particles. Our results from the Millennium mocks and those of Okumura & Jing (2011), which
explicitly tested for such effects, seem however to exclude this possibility.
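
As a quick arithmetic check of the numbers quoted above, the following Python sketch computes the relative offset between the recovered and expected $\beta$ and a rough significance for it. The function name is hypothetical, and combining the two uncertainties in quadrature (i.e. treating them as independent) is an assumption made here for illustration, not a procedure taken from the analysis.

```python
import math


def systematic_offset(beta_meas, err_meas, beta_true, err_true):
    """Relative systematic offset on beta and its approximate significance.

    Assumption (illustrative only): the two uncertainties are independent,
    so they are combined in quadrature.
    """
    rel = (beta_meas - beta_true) / beta_true
    sigma = math.hypot(err_meas, err_true)
    return rel, abs(beta_meas - beta_true) / sigma


# Values quoted in the text for the Millennium mocks
rel, nsigma = systematic_offset(0.577, 0.018, 0.636, 0.006)
print(f"relative offset = {rel:+.1%}, significance = {nsigma:.1f} sigma")
```

With the quoted values the offset is an under-estimate of roughly 9 per cent, consistent with the "about 10%" stated above.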



5 FORECASTING STATISTICAL ERRORS IN 
FUTURE SURVEYS 

A galaxy redshift survey can be essentially characterized by 
its volume V and the number density, n, and bias factor, 
b, of the galaxy population it includes (besides more spe- 
cific effects due to sample geometry or selection criteria). 
The precision in determining $\beta$ depends on these parameters. Using mock samples from
the Millennium run similar to those used here, Guzzo et al. (2008) calibrated a simple
scaling relation for the relative error on $\beta$, for a sample with $b = 1.3$:

$$\frac{\delta(\beta)}{\beta} \approx \frac{50}{n^{0.44}\,V^{0.5}} \qquad (19)$$

While a general agreement has been found comparing this relation to Fisher matrix
predictions (White, Song, & Percival 2009), this formula was strictly valid for the limited
density and volume ranges originally covered in that work. For example, the power-law
dependence on the density cannot realistically be extended to arbitrarily high densities,
as also pointed out by Simpson & Peacock (2010). In this section we present the results of a
more systematic investigation, exploring in more detail the scaling of errors when varying
the survey parameters. This will also include the dependence on the bias factor of the
galaxy population. In general, this approach is expected to provide a description of the
error budget which is superior to a Fisher matrix analysis, as it does not make any specific
assumption on the nature of the errors. All model fits presented in the following sections
are performed using the real-space correlation function $\xi(r)$ recovered from the
"observed" $\xi(r_p,\pi)$. This is done through the projection/de-projection procedure
described in Appendix B (with $\pi_{\rm max} = 25\,h^{-1}$ Mpc), which as we show increases the
statistical error by a factor of around 2. The goal here is clearly to be as close as
possible to the analysis of a real data set.

Figure 6. Comparison of the performances of the linear and linear-exponential models. Upper
panel: measurements of $\beta$ from the different halo catalogues, obtained with the linear
model of Eq. (5) (squares) and the linear-exponential model of Eq. (11) (triangles). Mean
values and errors are computed as in Fig. 5 from the 27 sub-cubes of each catalogue. We also
plot the expected values of $\beta$ from the simulation, $\beta_t = f/b_t$ (i.e. the "true"
$\beta$), and from the models of Fig. 3, $\beta_{T+10} = f/b_{T+10}$ and
$\beta_{SMT01} = f/b_{SMT01}$. Central panel: relative systematic error. Lower panel: relative
statistical error.

Figure 7. $\xi(r_p,\pi)$ for the Millennium mocks. The coding is the same as in Fig. 4, with
iso-correlation contours arbitrarily set to {0.05, 0.1, 0.25, 1}.



5.1 An improved scaling formula 

In doing this exercise, a specific problem is that, as shown in Table 1, catalogues with
larger mass (i.e. higher bias) are also less dense. Our aim is to separate the dependence of
the errors on these two variables. To do so, once a population of a given bias is defined by
choosing a given mass threshold, we construct a series of diluted samples obtained by
randomly removing objects. The process is repeated down to a minimum density of
$6.87 \times 10^{-5}\,h^3\,{\rm Mpc}^{-3}$, at which shot noise dominates and for the least
massive halos the recovered $\beta$ is consistent with zero. In this way, we obtain a series
of sub-samples of varying density for fixed bias, as reported in Table 2. The full samples
are the same used to build, e.g., Figure 5.
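
The dilution step described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a catalogue stored as an array of positions in a periodic cubic box; the function name `dilute`, the toy uniform catalogue and the box size are hypothetical and are not taken from the BASICC analysis.

```python
import numpy as np


def dilute(positions, box_size, target_density, rng=None):
    """Randomly subsample a catalogue down to a target number density.

    positions      : (N, 3) array of comoving positions [h^-1 Mpc]
    box_size       : side of the cubic box [h^-1 Mpc]
    target_density : desired number density [h^3 Mpc^-3]
    """
    rng = np.random.default_rng() if rng is None else rng
    n_target = int(round(target_density * box_size**3))
    if n_target > len(positions):
        raise ValueError("target density exceeds the catalogue density")
    # draw without replacement, so that each object is kept at most once
    keep = rng.choice(len(positions), size=n_target, replace=False)
    return positions[keep]


# Toy example: a uniform random "catalogue" in a small box (not BASICC data)
rng = np.random.default_rng(42)
box = 100.0
cat = rng.uniform(0.0, box, size=(3000, 3))   # n = 3e-3 h^3 Mpc^-3
sparse = dilute(cat, box, 6.87e-5, rng=rng)   # minimum density quoted in the text
print(len(sparse), len(sparse) / box**3)
```

Random removal leaves the bias of the population unchanged on average, which is why it isolates the density dependence at fixed mass threshold.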

In Figure 8 we plot the relative errors on $\beta$ measured from each catalogue of Table 2
as a function of the bias factor and the number density. These 3D plots are meant to provide
an overview of the global behaviour of the errors; a more detailed description is provided
in Figures 10 and 11, where 2D sections along $n$ and $b$ are reported. For all the samples
considered, the volume is held fixed.

As shown by the figure, the bias dependence is weak and approximately described by
$\delta(\beta)/\beta \propto b^{0.7}$, i.e. the error is slightly larger for higher-bias
objects. This indicates that the gain of a stronger clustering signal is more than
cancelled by the reduction of the distortion signal, when higher-bias objects are
considered. This is however fully true only for samples which are not too sparse
intrinsically. We see in fact that at extremely low densities the relationship is inverted,
with high-bias objects becoming favoured. At the same time, there is a clear general
flattening of the dependence of the error on the mean density $n$. The relation is not a
simple power law, but becomes constant at high values of $n$. In comparison, over the density
range considered here, the old scaling formula of Guzzo et al. would overestimate the error
significantly. This behaviour is easily interpreted as showing the transition from a
shot-noise dominated regime at low densities to a cosmic-variance dominated one, in which
there is no gain in further increasing the sampling. Such behaviour is clear for low-mass
halos (i.e. low bias) but is much weaker for more massive, intrinsically rare objects.

We can now try to model an improved empirical relation to reproduce quantitatively these
observed dependences. Let us first consider the general trend,
$\delta(\beta)/\beta \propto b^{0.7}$, which describes well the behaviour of
$\delta(\beta)/\beta$ in the cosmic-variance dominated region (i.e. at high density). In
Figure 8 such a power law is represented by a plane. We then need a function capable of
warping the plane in the low-density region, where the relative error becomes shot-noise
dominated. The best choice seems to be an exponential:
$\delta(\beta)/\beta \propto b^{0.7}\exp(n_0/n)$, where, by construction, $n_0$ roughly
corresponds to the threshold density above which cosmic variance dominates. Finally, we need
to add an exponential dependence on the bias so that at low density the relative error
decreases with $b$, such that the full expression becomes
$\delta(\beta)/\beta \propto b^{0.7}\exp[n_0/(b^2 n)]$. The grid shown in Figure 8 represents
the result of a direct fit of this functional form to the data, showing that it is indeed
well suited to describe the overall behaviour. In the right panel we have oriented the axes
so as to highlight the goodness of the fit: the rms of the residual between model and data
is $\approx 0.015$, which is an order of magnitude smaller than the smallest measured values
of $\delta(\beta)/\beta$. This gives our equation the predictive power we were looking for:
if we use it to produce forecasts of the precision of $\beta$ for a given survey, we shall
commit a negligible error$^4$ ($< 20\%$) on $\delta(\beta)/\beta$ (at least for values of
bias and volume within the ranges tested here).
To fully complete the relation, we only need to add the dependence on the volume, which is
in principle the easiest. To this end, we split the whole simulation cube into
$N_{\rm split}^3$ sub-cubes, with $N_{\rm split} = 3, 4, 5, 6$. By applying this procedure to 5
samples with different bias and number density (see Table 2) we make sure that our results
do not depend on the particular choice of bias and density. Figure 9 shows that
$\delta(\beta)/\beta \propto V^{-0.5}$ independently of $n$ and $b$, confirming the dependence
found by Guzzo et al. (2008). We can thus finally write the full scaling formula for the
relative error of $\beta$ we were seeking:

$$\frac{\delta(\beta)}{\beta} \approx C\,b^{0.7}\,V^{-0.5}\exp\left(\frac{n_0}{b^2 n}\right) \qquad (20)$$

where $n_0 = 1.7 \times 10^{-4}\,h^3\,{\rm Mpc}^{-3}$ and
$C = 4.9 \times 10^{2}\,h^{-1.5}\,{\rm Mpc}^{1.5}$. Clearly, by construction, this scaling
formula quantifies random errors, not the systematic ones.
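
For convenience, Eq. (20) can be evaluated with a short Python function such as the one sketched below. The constants are those quoted above; the function name and the survey parameters in the usage example are hypothetical, chosen only to illustrate the units ($n$ in $h^3\,{\rm Mpc}^{-3}$, $V$ in $h^{-3}\,{\rm Mpc}^3$). As stated in the text, the calibration strictly applies at $z = 1$ and within the explored ranges of bias, density and volume.

```python
import math

N0 = 1.7e-4   # threshold density n_0 of Eq. (20) [h^3 Mpc^-3]
C = 4.9e2     # normalisation C of Eq. (20) [h^-1.5 Mpc^1.5]


def relative_error_beta(bias, density, volume, n0=N0, c=C):
    """Relative statistical error on beta from the scaling formula, Eq. (20).

    bias    : linear bias b of the tracer
    density : mean number density n [h^3 Mpc^-3]
    volume  : survey volume V [h^-3 Mpc^3]
    """
    return c * bias**0.7 * volume**-0.5 * math.exp(n0 / (bias**2 * density))


# Hypothetical survey: b = 1.5, n = 1e-3 h^3 Mpc^-3, V = 1 h^-3 Gpc^3
err = relative_error_beta(bias=1.5, density=1.0e-3, volume=1.0e9)
print(f"delta(beta)/beta = {err:.3f}")
```

At high density the exponential factor tends to unity and the error saturates at the cosmic-variance floor $C\,b^{0.7}\,V^{-0.5}$, whereas at densities well below $n_0/b^2$ the shot-noise term dominates.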



5.2 Comparison to Fisher matrix predictions 

The Fisher information matrix provides a method for 
determining the sensitivity of a particular experiment to a 
set of parameters and has been widely used in cosmology. 



4 This estimate is obtained by comparing the smallest measured error,
$\delta(\beta)/\beta \approx 0.07$ (Figure 10), with the rms of the residuals,
$\approx 0.015$.

Figure 9. Relative error on $\beta$ as a function of volume, bias and number density. The
dependence on volume is explored by dividing the simulation cube into $N_{\rm split}^3$
sub-cubes, with $N_{\rm split} = 3, 4, 5, 6$. As in all of this section, in modelling the
measured $\xi(r_p,\pi)$ through Eq. (11) we use the deprojected $\xi(r)$ (with
$\pi_{\rm max} = 25\,h^{-1}$ Mpc), so as to represent a condition as close as possible to real
observations. The superimposed grid is described by the scaling formula of Eq. (20). Left
panel: $\delta(\beta)/\beta_t$ as a function of volume and bias, considering three different
threshold masses (i.e. biases), but randomly diluting the catalogues so as to keep a
constant number density, $n = 2.48 \times 10^{-4}\,h^3\,{\rm Mpc}^{-3}$, in all cases (see
Table 2, empty circles). Right panel: $\delta(\beta)/\beta_t$ as a function of the volume,
$V$, and the number density, $n$. Here we consider a single threshold mass,
$M_{\rm cut} = 1.10 \times 10^{12}\,h^{-1}\,M_\odot$, corresponding to a constant bias,
$b = 1.44$.

Table 2. Properties of the diluted sub-samples constructed to test the dependence of the
error of $\beta$ on bias and mean density. Each entry in the table is uniquely defined by a
pair $(M_{\rm cut}, n)$; moving along rows or columns the samples keep a fixed bias (mass
threshold) or density, respectively. Bias values are explicitly reported at the right-hand
side of the table. The diagonal coincides with the full (i.e. non-diluted) samples. Empty
circles indicate catalogues which have also been used to test the dependence on the volume:
they have been split into $N_{\rm split}^3$ sub-samples for $N_{\rm split} = 3, 4, 5, 6$,
whereas all other catalogues (filled circles) use $N_{\rm split} = 3$ only, for the sake of
building statistical quantities.

Figure 10. The relative error on $\beta$ as a function of the mean number density of the
sample, predicted with the Fisher matrix approach (solid and dotted lines) and measured from
the simulated samples (filled circles; colours coded as in previous figures). The solid and
dotted lines correspond to using respectively $k_{\rm max} = 0.2\,h\,{\rm Mpc}^{-1}$ or
$k_{\rm max} = 1\,h\,{\rm Mpc}^{-1}$ (with Lorentzian damping) in the Fisher forecasts. The
dashed lines show in addition the behaviour of the scaling formula obtained from the
simulation results (Eq. 20). This is also compared, in the top-left panel, to the old
simplified fitting formula for $b = 1.3$ galaxies of Eq. (19).



In particular, Tegmark (1997) introduced an implementation of the Fisher matrix aimed at
forecasting errors on cosmological parameters derived from the galaxy power spectrum $P(k)$,
based on its expected observational uncertainty, as described by Feldman, Kaiser, & Peacock
(1994, FKP). This was adapted by Seo & Eisenstein (2003) to the measurements of distances
using the baryonic acoustic oscillations in $P(k)$. Following the renewed interest in RSD,
over the past few years the Fisher matrix technique has also been applied to predict the
errors expected on $\beta$, $f$ and related parameters (e.g. Linder 2008; Wang 2008;
Percival & White 2009; White, Song, & Percival 2009; Simpson & Peacock 2010; Wang et al.
2010; Samushia et al. 2011; Bueno Belloso, Garcia-Bellido, & Sapone 2011; di Porto,
Amendola, & Branchini 2012). The extensive simulations performed here provide us with a
natural opportunity to perform a first simple and direct test of these predictions. Given
the number of details that enter in the Fisher matrix implementation, this cannot be
considered as exhaustive. Yet, a number of interesting indications emerge, as we shall see.

Figure 11. The relative error on $\beta$ as a function of the effective bias factor,
predicted by the Fisher matrix (solid and dotted lines) and measured from the simulated
samples (filled circles; colours coded as in previous figures). The solid and dotted lines
correspond to using respectively $k_{\rm max} = 0.2\,h\,{\rm Mpc}^{-1}$ or
$k_{\rm max} = 1\,h\,{\rm Mpc}^{-1}$ (with Lorentzian damping) in the Fisher forecasts. The
dashed lines show in addition the behaviour of the scaling formula obtained from the
simulation results (Eq. 20).

We have computed Fisher matrices for all catalogues in Table 2 using a code following White,
Song, & Percival (2009). In particular, our Fisher matrix predicts errors on $\beta$ and
$b$, given the errors on the linear redshift-space power spectrum modelled as in Eq. (4)
(Kaiser 1987). We first limit the computations to linear scales, applying the standard
cut-off $k < k_{\rm max} = 0.2\,h\,{\rm Mpc}^{-1}$. We also explore the possibility of
including wavenumbers as large as $k = \pi/3 \simeq 1\,h\,{\rm Mpc}^{-1}$ (that should better
match the typical scales we fit in the correlation functions from the simulations),
accounting for non-linearity through a conventional small-scale Lorentzian damping term. Our
fiducial cosmology corresponds to that used in the simulation, i.e. $\Omega_m = 0.25$,
$\Omega_\Lambda = 0.75$, $h = 0.73$ and $\sigma_8 = 0.9$ today. We also choose
$\sigma_{12} = 200$ km s$^{-1}$ as reference value for the pairwise dispersion. We do not
consider geometric distortions (Alcock & Paczynski 1979), whose impact on RSD is addressed
in the parallel paper by Marulli et al. (2012). To obtain the Fisher predictions on $\beta$,
we marginalize over the bias, to account for the uncertainty on its precise value, and over
the pairwise velocity dispersion in the damping term (when present).
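
To make the structure of such a forecast concrete, the following Python sketch builds a two-parameter $(\beta, b)$ Fisher matrix for the linear Kaiser model with FKP-like weighting, and marginalizes over the bias. It is not the code used for the results of this section: the matter power spectrum `toy_matter_power` is an arbitrary analytic shape introduced here purely for illustration, the Lorentzian damping is omitted, all function names are hypothetical, and the integral is evaluated with a simple midpoint rule.

```python
import numpy as np


def toy_matter_power(k, amplitude=2.0e4, k_turn=0.02):
    """Toy matter power spectrum [h^-3 Mpc^3]; illustrative shape only."""
    x = k / k_turn
    return amplitude * x / (1.0 + x**2) ** 1.2


def fisher_beta_bias(beta, bias, density, volume, k_min=1.0e-3, k_max=0.2,
                     n_k=400, n_mu=200, pk=toy_matter_power):
    """2x2 Fisher matrix for (beta, b) with P^s = b^2 (1 + beta mu^2)^2 P_m(k).

    F_ij = V / (8 pi^2) * int_{-1}^{1} dmu int k^2 dk
           (dlnP/dtheta_i) (dlnP/dtheta_j) [n P / (1 + n P)]^2
    """
    dk = (k_max - k_min) / n_k
    dmu = 1.0 / n_mu
    k = k_min + dk * (np.arange(n_k) + 0.5)     # midpoints in k
    mu = dmu * (np.arange(n_mu) + 0.5)          # midpoints in mu, on [0, 1]
    kk, mm = np.meshgrid(k, mu, indexing="ij")
    p_gal = bias**2 * (1.0 + beta * mm**2) ** 2 * pk(kk)
    # logarithmic derivatives of P^s with respect to beta and b
    dlnp = np.stack([2.0 * mm**2 / (1.0 + beta * mm**2),
                     np.full_like(mm, 2.0 / bias)])
    weight = (density * p_gal / (1.0 + density * p_gal)) ** 2   # V_eff / V
    integrand = dlnp[:, None] * dlnp[None, :] * weight * kk**2
    # factor 2: the integrand is even in mu, so [0, 1] covers half of [-1, 1]
    return volume / (8.0 * np.pi**2) * 2.0 * integrand.sum(axis=(-2, -1)) * dk * dmu


def forecast_relative_error(beta, bias, density, volume, **kwargs):
    """Relative error on beta, marginalized over the bias."""
    fisher = fisher_beta_bias(beta, bias, density, volume, **kwargs)
    cov = np.linalg.inv(fisher)
    return np.sqrt(cov[0, 0]) / beta


# Hypothetical sample: beta = 0.58, b = 1.44, n = 1e-3 h^3 Mpc^-3, V = 1 h^-3 Gpc^3
print(forecast_relative_error(0.58, 1.44, 1.0e-3, 1.0e9))
```

In this idealized form the forecast error scales exactly as $V^{-0.5}$, and the weight $[nP/(1+nP)]^2$ is what produces the transition between the shot-noise and cosmic-variance dominated regimes; raising `k_max` simply adds more modes to the sum, with no penalty for the breakdown of the linear model.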

Figure 10 shows the measured relative errors on $\beta$ as a function of the number density,
compared to the Fisher forecasts for the two choices of $k_{\rm max}$. We also plot the
scaling relation from Eq. (20), which best represents the simulation results. We see that
the simulation results are in fairly good agreement with the Fisher predictions, when we
limit the computation to very linear scales in the power spectrum (solid line). The
inclusion of higher wavenumbers produces unrealistically small errors, with a wrong
dependence on the number density. Both the solid lines and the points reproduce the observed
flattening at high number densities, which corresponds to the transition from a shot-noise
dominated to a cosmic-variance dominated regime.

Similarly, Figure 11 looks at the dependence of the error on the linear bias parameter,
comparing the simulation results (points and scaling formula best-fit) to the Fisher
forecasts. The behaviour is similar to that observed for the number density: there is a
fairly good agreement when the Fisher predictions are computed using
$k_{\rm max} = 0.2\,h\,{\rm Mpc}^{-1}$, except for very low values of the number density and
the bias. Again, when non-linear scales are included, the Fisher predictions become too
optimistic by a large factor.
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6 SUMMARY AND DISCUSSION 

We have performed an extensive investigation of statistical and systematic errors in
measurements of the redshift-distortion parameter $\beta$ from future surveys. We have
considered tracers of the large-scale distribution of mass with
varying levels of bias, corresponding to objects like galax- 
ies, groups and clusters. To this purpose, we have analyzed 
large catalogues of dark-matter halos extracted from a snap- 
shot of the BASICC simulation at z = 1. Our results clearly 
evidence the limitations of the linear description of redshift- 
space distortions, showing how errors depend on the typi- 
cal survey properties (volume and number density) and the 
properties of the tracers (bias, i.e. typical mass). Let us re- 
cap them and discuss their main implications. 

• Estimating $\beta$ using the Hamilton/Kaiser harmonic expansion of the redshift-space
correlation function $\xi(r_p,\pi)$, extended to typical scales, leads to a systematic error
of up to 10%. This is much larger than the statistical error of a few percent reachable by
next-generation surveys. The largest systematic error is found for small-bias objects, and
decreases, reaching a minimum for halos of $10^{13}\,h^{-1}\,M_\odot$. This reinforces the
trend observed by Okumura & Jing (2011).

• Additional analyses of mock surveys from the Millennium run confirm that the observed
systematic errors are not the result of potentially missing sub-structure in the BASICC halo
catalogues.

• The use of the deprojected correlation function increases the statistical error, inducing
also some additional systematic effects (details are given in Appendix B and also in the
companion paper by Marulli et al. 2012).

• For highly biased objects, which are sparser and whose surveys typically cover larger,
more linear scales, the simple Kaiser model describes the simulated data fairly well,
without the need for the empirical damping term with one extra parameter accounting for
non-linear motions. This results in smaller statistical errors.

• We have derived a comprehensive scaling formula, Eq. (20), to predict the precision (i.e.
relative statistical error) reachable on $\beta$ as a function of survey parameters. This
expression improves on a previous attempt (Guzzo et al. 2008), generalizing the prediction
to a population of arbitrary bias and properly describing the dependence on the number
density.

This formula can be useful to produce quite general and reliable forecasts for future
surveys$^5$. One should in any case consider that there are a few implementation-specific
factors that can modify the absolute values of the recovered rms errors. For example, these
would depend on the range of scales over which $\xi(r_p,\pi)$ is fitted. The values obtained
here refer to fits performed between $r_{\rm min} = 3$ and $r_{\rm max} = 35\,h^{-1}$ Mpc. This
has been identified through several experiments as an optimal range to minimize statistical
and systematic errors for surveys of this size (Bianchi 2010). Theoretically, one may find it
natural to push $r_{\rm max}$, or both $r_{\rm min}$ and $r_{\rm max}$, to larger scales, so as
to (supposedly) reduce the weight of nonlinear scales. In practice, however, in both cases
we see that random errors increase in amplitude (while the systematic error is not
appreciably reduced).

5 For example, it has recently been used, in combination with a Fisher matrix analysis, to
predict errors on the growth rate expected from the ESA Euclid spectroscopic survey [cf.
Fig. 2.5 of Laureijs et al. (2011)].

Similarly, one should also keep in mind that the formula is strictly valid for $z = 1$, i.e.
the redshift where it has been calibrated. There is no obvious reason to expect the scaling
laws among the different quantities (density, volume, bias) to depend significantly on the
redshift. This is confirmed by a few preliminary measurements we performed on halo
catalogues from the $z = 0.25$ snapshot of the BASICC. Conversely, the magnitude of the
errors may change, as shown, e.g., in de la Torre & Guzzo (2012). We expect these effects to
be described by a simple renormalization of the constant $C$.

Finally, one may also consider that the standard deviations measured using the 27 sub-cubes
could be underestimated, if these are not fully independent. We minimize this by maximizing
the size of each sub-cube, while having enough of them to build meaningful statistics. The
side of each of the 27 sub-cubes used is in fact close to $500\,h^{-1}$ Mpc, benefiting from
the large size of the BASICC simulation.

• We have compared the error estimates from our simulations with idealized predictions based
on the Fisher matrix approach, customarily implemented in Fourier space. We find a good
agreement, but only when the Fisher computation is limited to significantly large scales,
i.e. $k < k_{\rm max} = 0.2\,h\,{\rm Mpc}^{-1}$. When more non-linear scales are included (as
an attempt to roughly match those actually involved in the fitting of $\xi(r_p,\pi)$ in
configuration space), then the predicted errors become unrealistically small. This indicates
that the usual convention of adopting $k_{\rm max} \sim 0.2\,h\,{\rm Mpc}^{-1}$ for this kind
of study is well posed. On the other hand, it seems paradoxical that in this way the two
methods are looking at different ranges of scales. The critical point clearly lies in the
idealized nature of the Fisher matrix technique. When moving up with $k_{\rm max}$ and thus
adding more and more nonlinear scales, the Fisher technique simply accumulates signal and
dramatically improves the predicted error, clearly unaware of the additional "noise"
introduced by the breakdown of linearity. On the other hand, if in the direct fit of
$\xi(r_p,\pi)$ (or $P(k,\mu)$) one conversely considers a corresponding very linear range
$r > 2\pi/k_{\rm max} \simeq 30\,h^{-1}$ Mpc, a poor fit is obtained, with much larger
statistical errors than shown,
e.g., in Fig. [S] There is no doubt that smaller, mildly non- 
linear scales at intermediate separations have necessarily to 
be included in the modelling if one aims at reaching percent statistical errors on
measurements of $\beta$ (or $f$). If one does this in the Fisher matrix, then the predicted
errors are too small. The need to push our estimates to scales which are not fully linear
will remain true even with surveys of the next generation, including tens of millions of
galaxies over Gpc volumes, because that is where the clustering and distortion signals are
(and will still be) the strongest. Of course, our parallel results on the amount of
systematic errors that plague estimates based on the standard dispersion model also
reinforce the evidence that better modelling of nonlinear effects is needed on these scales.
The strong effort being spent in this direction gives some confidence that significant
technical progress will happen in the coming years (see e.g. Kwan, Lewis, & Linder 2012; de
la Torre & Guzzo 2012, and references therein).

In any case, this limited exploration suggests once more that forecasts based on the Fisher
matrix approach, while giving useful guidelines and evidencing the error dependences, have
to be treated with significant caution and possibly verified with more direct methods. A
similar tension between Fisher and Monte Carlo forecasts has been recently noticed by Hawken
et al. (2012).

• Finally, in Appendix A we have also clarified which is
the least biased form to adopt for the likelihood
when fitting models to the observed redshift-space correlation
function, proposing a slightly different form with respect
to previous works.
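
The behaviour of the Fisher forecast discussed above (the predicted error keeps shrinking as k_max grows) can be illustrated with a minimal sketch. It assumes a linear Kaiser model, the FKP effective volume, and only two free parameters ($\beta$ and an overall amplitude). The power spectrum shape `toy_pk`, the survey volume, the number density and all function names are hypothetical choices made for illustration, not the setup used in the paper.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def toy_pk(k, amp=2.0e4, k0=0.02):
    """Hypothetical smooth real-space galaxy power spectrum [(Mpc/h)^3]."""
    x = k / k0
    return amp * x / (1.0 + x ** 2) ** 1.2

def fisher_beta_error(k_max, beta=0.5, volume=1.0e9, nbar=1.0e-3,
                      k_min=0.01, n_k=400, n_mu=201):
    """Marginalized relative error on beta using modes with k_min < k < k_max."""
    k = np.linspace(k_min, k_max, n_k)
    mu = np.linspace(-1.0, 1.0, n_mu)
    kk, mm = np.meshgrid(k, mu, indexing="ij")          # shape (n_k, n_mu)
    p = toy_pk(kk) * (1.0 + beta * mm ** 2) ** 2         # linear Kaiser model
    v_eff = volume * (nbar * p / (1.0 + nbar * p)) ** 2  # FKP effective volume
    # Derivatives of ln P with respect to beta and to ln(amplitude).
    derivs = [2.0 * mm ** 2 / (1.0 + beta * mm ** 2), np.ones_like(kk)]
    fisher = np.empty((2, 2))
    for a in range(2):
        for b in range(2):
            integrand = derivs[a] * derivs[b] * v_eff * kk ** 2
            # Integrate over mu first, then over k.
            fisher[a, b] = trapezoid(trapezoid(integrand, mu), k) / (8.0 * np.pi ** 2)
    cov = np.linalg.inv(fisher)
    return float(np.sqrt(cov[0, 0])) / beta

# The forecast error decreases monotonically as more (nonlinear) modes are added.
errors = {k_max: fisher_beta_error(k_max) for k_max in (0.1, 0.2, 0.5)}
```

Nothing in this calculation accounts for the breakdown of the linear model at high k, which is why the forecast improves indefinitely with k_max.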

With redshift-space distortions having emerged as a
probe of primary interest in current and future dark-energy-oriented
galaxy surveys, the results presented here further
stress the need for improved descriptions of non-linear ef- 
fects in clustering and dynamical analyses. On the other 
hand, they also indicate the importance of building surveys 
for which multiple tracers of RSD (with different bias values)
can be identified and used in combination to help understand
and minimize systematic errors.
APPENDIX A: DEFINITION OF THE
LIKELIHOOD FUNCTION TO ESTIMATE $\beta$



To estimate $\beta$, in Section ^. 2l we defined a likelihood function
comparing the measured correlation function $\xi(r_p,\pi)$ and
the corresponding parameterized models. Our likelihood is
simply given by the standard $\chi^2$ expression

$$ -2\ln\mathcal{L} = \sum_{i,j} \frac{\left(y_{ij} - y_{ij}^{m}\right)^2}{\delta_{ij}^2} , \tag{A1} $$



where, however, the stochastic variable considered is not just
the value of $\xi(r_p,\pi)$ at each separation $(r_p,\pi) = (r_i, r_j)$, but
the expression

$$ y_{ij} = \log[1 + \xi(r_i, r_j)] , \tag{A2} $$



which has the desirable property of placing more weight
on large, more linear scales. This was first proposed by
Hawkins et al. (2003), who correspondingly adopt the following
expression for the expectation value of the variance:

$$ \delta_{ij}^2 = \left\{\log[1 + \xi_{ij} + \delta(\xi_{ij})] - \log[1 + \xi_{ij} - \delta(\xi_{ij})]\right\}^2 . \tag{A3} $$

This simply maps onto the new variables $y_{ij}$ the interval
including 68% of the distribution in the original variables
$\xi_{ij}$, i.e. twice the standard deviation if this were Gaussian
distributed. Strictly speaking, an extra factor 1/2 would
be formally required here if one aims at defining the equivalent of
a standard deviation, but this has no effect on the
minimization and thus on the best-fitting parameters.
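
As a minimal sketch of how Eqs. (A1)-(A3) fit together, the snippet below evaluates the $\chi^2$ for a toy grid of $\xi(r_p,\pi)$ values. It assumes natural logarithms and independent bins. The function names and the numerical values are illustrative only and are not taken from the original analysis.

```python
import numpy as np

def y_transform(xi):
    """Eq. (A2): y = log(1 + xi), here with the natural logarithm (an assumption)."""
    return np.log1p(xi)

def hawkins_variance(xi, dxi):
    """Eq. (A3): squared width of the interval [xi - dxi, xi + dxi] mapped onto y."""
    return (np.log1p(xi + dxi) - np.log1p(xi - dxi)) ** 2

def chi2(xi_data, xi_model, delta2):
    """Eq. (A1): -2 ln L as a sum over all (i, j) bins."""
    resid = y_transform(xi_data) - y_transform(xi_model)
    return float(np.sum(resid ** 2 / delta2))

# Toy example (hypothetical numbers) on a 2 x 2 grid of (r_p, pi) bins.
xi_data = np.array([[1.20, 0.40], [0.35, 0.10]])
xi_model = np.array([[1.10, 0.42], [0.33, 0.12]])
dxi = 0.05 * np.ones_like(xi_data)   # assumed 1-sigma errors on xi
chi2_value = chi2(xi_data, xi_model, hawkins_variance(xi_data, dxi))
```

Note that `hawkins_variance` takes the measured, noisy `xi_data` as input, so the weights themselves fluctuate with the data.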

However, the weighting factors $1/\delta_{ij}^2$ in the likelihood
definition depend explicitly on $\xi_{ij}$, which may result in an
improper weighting of the data when the correlation signal
fluctuates near zero. We have directly verified that when the
estimate is noisy, it is preferable to use a smooth weighting
scheme rather than one that is sensitive to local random
oscillations of $\xi$, which is more likely to yield biased
estimates. This supported our choice of adopting the usual
sample-variance expression

$$ \delta_{ij}^2 = \frac{1}{N}\sum_{k=1}^{N} \left(y_{ij}^{k} - \langle y_{ij}\rangle\right)^2 , \tag{A4} $$



estimated over $N$ realizations of the survey. This can be
done using mock realizations (Guzzo et al. 2008) or, alternatively,
through appropriate jack-knife or bootstrap resamplings
of the data. Specifically, we find a significant
advantage of the weighting scheme based on sample variance
when dealing with low-density samples. This is shown
in Figure A1, where $\beta$ is estimated on the catalogue with
$M_{\rm cut} = 1.10\times10^{12}\,h^{-1}M_\odot$ using the two likelihoods and
gradually diluting the sample (note that all computations
in this section use the linear-exponential model, with $\xi(r)$
directly measured in real space).

Figure A1. Mean value (top) and relative scatter (bottom) of
$\beta$, as recovered from catalogues with varying density (but same
volume and bias), using the two different definitions of the variance
of each data point of Eqs. (A3) (open blue squares) and (A4)
(open red circles). The dashed line shows as reference the asymptotic
common value of $\beta$ that both methods identically recover
at high densities. Note how using Eq. (A4) yields an unbiased
estimate down to significantly smaller densities, whereas the estimator
based on Eq. (A3) rapidly becomes more and more biased
below $n \simeq 5\times10^{-4}\,h^3\,{\rm Mpc}^{-3}$. The intrinsic scatter of the
measurements, obtained as usual from the 27 sub-cubes of this specific
catalogue, also follows a similar trend.
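
The difference between the two weighting schemes can be sketched as follows. The example assumes natural logarithms and a set of mock realizations stored as a NumPy array of shape (N, n_bins). The toy signal, noise level and function names are hypothetical and serve only to show how the two variances are built.

```python
import numpy as np

def sample_variance_y(xi_realizations):
    """Eq. (A4): variance of y = log(1 + xi) over N realizations (axis 0)."""
    y = np.log1p(np.asarray(xi_realizations, dtype=float))
    return np.mean((y - y.mean(axis=0)) ** 2, axis=0)

def hawkins_variance(xi, dxi):
    """Eq. (A3); set to NaN where 1 + xi - dxi <= 0 and the logarithm is undefined."""
    xi = np.asarray(xi, dtype=float)
    lo = 1.0 + xi - dxi
    hi = 1.0 + xi + dxi
    out = np.full(xi.shape, np.nan)
    ok = lo > 0
    out[ok] = (np.log(hi[ok]) - np.log(lo[ok])) ** 2
    return out

# Toy data (hypothetical numbers): 27 noisy realizations of a weak signal in 3 bins.
rng = np.random.default_rng(0)
xi_true = np.array([0.30, 0.10, 0.02])
sigma_xi = 0.05
xi_mocks = xi_true + sigma_xi * rng.standard_normal((27, xi_true.size))

# One smooth weight per bin, independent of the noise in any single measurement.
delta2_sample = sample_variance_y(xi_mocks)
# Weights that follow the noisy xi values of one particular realization.
delta2_hawkins = hawkins_variance(xi_mocks[0], sigma_xi)

# A bin with 1 + xi - delta(xi) < 0 has no defined Eq. (A3) weight.
undefined = hawkins_variance(np.array([-0.6]), 0.5)
```

The sample-variance weights depend only on the ensemble, whereas the Eq. (A3) weights change from one noisy realization to the next, which is the sensitivity described in the text.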

In order to understand the reasons behind this behaviour,
we have studied independently the various terms
composing the likelihood. We use one single sub-cube (i.e.
1/27 of the total volume), from the catalogue with
$M_{\rm cut} = 1.10\times10^{12}\,h^{-1}M_\odot$, and consider two extreme values of
the mean density. First, we consider the case of the highest
density achievable with this halo catalogue,
$n = 3.11\times10^{-3}\,h^3\,{\rm Mpc}^{-3}$. In the upper panel of Figure A2 we plot a
section of $\xi(r_p,\pi)$ at constant $\pi = 9.75\,h^{-1}$ Mpc, together
with the model $\xi_m(r_p,\pi)$ corresponding to the best-fit $\beta$ and
$\sigma_{12}$ parameters. In this density regime the values of the
recovered best-fit parameters are essentially independent of
the form chosen for $\delta_{ij}^2$ (as shown by the coincident values
of $\beta$ on the right side of Figure A1). The match of the
model to the data is very good. In the central panel, we
plot instead, for each bin $i$ along $r_p$, the absolute value of
the difference between model and observation, $(|y - y_m|)_i$,
together with the corresponding standard deviations in the
two cases, which are virtually indistinguishable from each
other. Finally, the lower panel shows the full values of the
terms contributing to the $\chi^2$ sum, again showing the equivalence
of the two choices in this density regime.

However, when we sparsely sample the catalogue, so as to
reach a mean density of $n = 9.58\times10^{-5}\,h^3\,{\rm Mpc}^{-3}$ (leaving
all other parameters unchanged), a very different behaviour
emerges (Figure A3)$^6$. Using the Hawkins et al. definition
for the variance yields a best-fit model that overestimates
the data on almost all scales (top panel), corresponding to
unphysical values of $\beta = 2.33$ and $\sigma_{12} = 2112$ km s$^{-1}$. The
central panel now shows how in this regime the two definitions
of the scatter (which weigh the data-model difference)
behave in significantly different ways, with the Hawkins et
al. definition being much less stable than the one used here,
and in general anti-correlated with the values of $\xi(r_p,\pi)$ in
the upper panel. In the lower panel, the dashed line shows
how this anti-correlation smooths down the $(|y - y_m|)_i$ peaks,
resulting in erroneously low values for the $\chi^2$ that drive the
fit to a wrong region of the parameter space. In the same
panel, the solid line shows how the likelihood computed with
our definition for these same parameters gives high $\chi^2$ values,
thus correctly rejecting the model$^7$.

6 In Figure A1 (upper panel, second blue square from the left)
we show the same behaviour when averaged over 27 sub-samples.

Figure A2. Comparison of the performance of the two likelihood
forms discussed in the text in the high-density regime, using
the fully sampled population of halos from a single sub-cube (1/27
of the volume) with $M_{\rm cut} = 1.10\times10^{12}\,h^{-1}M_\odot$. Top panel: cut
through $\xi(r_p,\pi)$ at fixed $\pi = 9.75\,h^{-1}$ Mpc (broken line), and
corresponding best-fit model $\xi_m(r_p,\pi)$ using the Hawkins et al.
form for the scatter of each data point (continuous line). Central
panel: residual values $|y_{ij} - y_{ij}^{m}|$ between the data and model
values (light grey line) and values for the scatter of each point,
according to the two definitions of Eqs. (A4) (solid red line) and
(A3) (dashed blue line). Bottom panel: corresponding terms in the
$\chi^2$ sum (see Eq. A1). The two definitions for the scatter, as
expected, produce virtually identical values for the likelihood.



7 [...] $h^{-1}$ Mpc (and $\pi = 9.75\,h^{-1}$ Mpc) we find
$1 + \xi - \delta(\xi) < 0$. Consequently, $\delta_{\rm Hawkins}$ is not well defined (Figure A3,
central panel), resulting in a zero weight for the corresponding $\chi^2$
summand (lower panel).



Figure A3. Same as Figure A2, but now in the low-density
regime ($n = 9.58\times10^{-5}\,h^3\,{\rm Mpc}^{-3}$). Again, the model curve in
the top panel corresponds to the best-fit parameters obtained
using the Hawkins et al. form of the scatter of each measurement.
The fit is very unsatisfactory. The bottom panel shows how the
likelihood expression based instead on the standard deviation of
$y$ from Eq. (A4) rejects these parameter values, giving high $\chi^2$
values (red solid curve). Note the different scale on the ordinate
with respect to the previous figure.



APPENDIX B: ADDITIONAL SYSTEMATIC 
EFFECT WHEN USING THE DEPROJECTED 
CORRELATION FUNCTION 

In a real survey, the direct measurement of $\xi(r)$ is not possible.
A way around this obstacle is to project $\xi(r_p,\pi)$ along
the line of sight, i.e. along the direction affected by redshift
distortions. We hence define the projected correlation function
as

$$ w_p(r_p) = 2\int_0^\infty \xi(r_p,\pi)\,d\pi = 2\int_{r_p}^\infty \frac{r'\,\xi(r')\,dr'}{\sqrt{r'^2 - r_p^2}} . \tag{B1} $$



Inverting the integral we recover $\xi(r)$. More precisely, following
Saunders, Rowan-Robinson & Lawrence (1992), we
have

$$ \xi(r) = -\frac{1}{\pi}\int_r^\infty \frac{dw_p(r_p)/dr_p}{\sqrt{r_p^2 - r^2}}\,dr_p , \tag{B2} $$

where $\pi$ in the prefactor is the usual mathematical constant, not to be
confused with the line-of-sight separation $\pi$ in Eq. (B1).
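
The projection and its inversion can be checked numerically on a case with a known answer. The sketch below assumes a pure power-law $\xi(r) = (r/r_0)^{-\gamma}$, for which Eq. (B1) has a closed form, and then applies Eq. (B2) using the substitution $r_p = \sqrt{r^2 + t^2}$ to avoid the integrable singularity at $r_p = r$. The power-law parameters, the finite-difference derivative and the function names are illustrative assumptions, not the procedure used in the paper.

```python
import math
import numpy as np

R0, GAMMA = 5.0, 1.8   # hypothetical power-law parameters, xi(r) = (r / R0)^(-GAMMA)

def xi_true(r):
    """Real-space correlation function of the toy model."""
    return (r / R0) ** (-GAMMA)

def wp_power_law(rp):
    """Eq. (B1) evaluated analytically for the power law above."""
    norm = math.gamma(0.5) * math.gamma((GAMMA - 1.0) / 2.0) / math.gamma(GAMMA / 2.0)
    return rp * (R0 / rp) ** GAMMA * norm

def deproject(wp, r, t_max=2000.0, n_steps=200001, eps=1.0e-4):
    """Eq. (B2) with r_p = sqrt(r^2 + t^2), so the integrand is dw_p/dr_p / r_p in t."""
    t = np.linspace(0.0, t_max, n_steps)
    rp = np.sqrt(r ** 2 + t ** 2)
    # Central finite difference for dw_p/dr_p.
    dwp_drp = (wp(rp * (1.0 + eps)) - wp(rp * (1.0 - eps))) / (2.0 * eps * rp)
    integrand = dwp_drp / rp
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return -integral / np.pi

# Round trip: project analytically, deproject numerically, compare with the input.
xi_rec = deproject(wp_power_law, 5.0)
```

With noisy, binned measurements of $w_p$ the derivative is far less well behaved than in this idealized test.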

A more extended investigation of the effects arising
when using the deprojected $\xi(r)$ instead of that directly
measured (hereafter $\xi_{\rm dep}$ and $\xi_{\rm dir}$, respectively) is carried
out in Marulli et al. (2012). Here we limit the discussion
to the impact of the deprojection technique on the estimate
of $\beta$, as a function of the mass (i.e. the bias) of the
adopted tracers, focussing on the systematic effects (Figure
B1). One possible source of systematic error in performing
the deprojection is the necessity of defining a finite integration
limit $\pi_{\rm max}$ in Eq. (B2). In Figure B1 two different
choices of $\pi_{\rm max}$ are considered. We notice that these
choices (purple inverted triangles and yellow rhombs) result
in different slopes of $\beta$ as a function of bias, which differ
from the slope obtained using $\xi_{\rm dir}$ (green triangles). This
is plausibly due to the fact that, using a limiting $\pi_{\rm max}$, we
are underestimating the integral (consider that $\xi > 0$ for
$\pi < 100\,h^{-1}$ Mpc). This effect grows when the bias increases,
because of the corresponding growth of $\xi$, which leads to a
larger "loss of power" in $w_p$. However, we cannot use arbitrarily
large values of $\pi_{\rm max}$ because the statistical error
increases for larger $\pi_{\rm max}$ (see lowest panel of Figure B1).
This may be due to the increase of the shot noise at large
separations. Similarly, the drop of correlation signal at small
separations due to the finite size of the dark matter halos
produces an impact on $\beta$ which grows with bias. Finally, as
suggested previously (Guzzo et al. 2008) and discussed extensively
in Marulli et al. (2012), Figure B1 shows how using
$\xi_{\rm dep}$ in modelling RSD produces a statistical error about
twice as large as that obtained using $\xi_{\rm dir}$ (lower panel).

Figure B1. The effect of using the de-projected real-space correlation
function in the RSD model. Upper panel: values of $\beta$
obtained when the real-space correlation function $\xi(r)$ is directly
measured from the simulation (triangles) or deprojected as in real
surveys (rhombs and inverted triangles). The latter correspond to
two different integration limits $\pi_{\rm max}$ in the projection. The two
lower panels give the systematic and statistical error as in Figure 6.
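
The "loss of power" caused by a finite $\pi_{\rm max}$ can be quantified with a small numerical sketch. It assumes an isotropic power-law $\xi(r) = (r/r_0)^{-\gamma}$ that is positive on all scales and ignores redshift-space distortions altogether, so it only illustrates the truncation of the line-of-sight integral. The parameter values and function names are hypothetical.

```python
import math
import numpy as np

R0, GAMMA = 5.0, 1.8   # hypothetical power-law parameters

def xi_power_law(r):
    """Toy real-space correlation function, xi(r) = (r / R0)^(-GAMMA)."""
    return (r / R0) ** (-GAMMA)

def wp_truncated(rp, pi_max, n_steps=20001):
    """Line-of-sight projection of xi with the integral cut at pi_max."""
    pi_los = np.linspace(0.0, pi_max, n_steps)
    f = xi_power_law(np.sqrt(rp ** 2 + pi_los ** 2))
    return 2.0 * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(pi_los))

def wp_full(rp):
    """Closed form of the same projection integrated to infinity."""
    norm = math.gamma(0.5) * math.gamma((GAMMA - 1.0) / 2.0) / math.gamma(GAMMA / 2.0)
    return rp * (R0 / rp) ** GAMMA * norm

# Fraction of the full projected signal recovered for different integration limits.
rp = 5.0
recovered = {pi_max: wp_truncated(rp, pi_max) / wp_full(rp)
             for pi_max in (25.0, 50.0, 100.0)}
```

In this toy model the recovered fraction stays below unity and increases with $\pi_{\rm max}$, consistent with the underestimate of the integral described above; the competing increase of the statistical error at large $\pi_{\rm max}$ is not modelled here.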



This paper has been typeset from a TeX/LaTeX file prepared
by the author.